Protein Classification via Kernel Matrix Completion
Abstract
The three-dimensional structure of a protein provides crucial information for predicting its function. However, measuring the 3D coordinates of the atoms in a protein is still far more difficult and costly than sequencing its amino acids, so we often do not know the 3D structures of all the proteins at hand. Consider a kernel matrix whose entries represent similarities between proteins in terms of their 3D structures. Some of its entries are missing because structural information about some proteins is not available, whereas their amino acid sequences are readily available. This chapter proposes to estimate the missing entries by means of another kernel matrix derived from the amino acid sequences. A parametric model is created from the sequence kernel matrix, and the missing entries of the structure kernel matrix are estimated by fitting this model to the existing entries. For model fitting, we adopt two algorithms, single e-projection and the em algorithm, both based on the information geometry of positive definite matrices. To evaluate and demonstrate the performance of our method, we performed protein classification experiments using support vector machines (SVMs). Our results show that these algorithms can effectively estimate the missing entries.
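The fill-in idea can be illustrated with a minimal sketch. It is not the chapter's single e-projection or em algorithm: it assumes a single fixed, positive definite model matrix `P` (standing in for a sequence-derived kernel) and skips the parameter fitting entirely. Under that assumption, the missing blocks that minimize the Kullback-Leibler divergence between the zero-mean Gaussians with covariances `K` and `P`, while keeping the observed block fixed, have a closed form. The function name `complete_kernel`, the block ordering (observed proteins first), and the random demo data are all hypothetical.

```python
import numpy as np


def complete_kernel(K_vv, P):
    """Fill the missing blocks of a kernel matrix from a fixed model matrix.

    K_vv : (m, m) observed block (proteins with known structure).
    P    : (n, n) positive definite model matrix; its first m rows/columns
           are assumed to correspond to the observed proteins.
    Returns the (n, n) completed matrix whose top-left block equals K_vv.
    """
    m = K_vv.shape[0]
    P_vv, P_vh, P_hh = P[:m, :m], P[:m, m:], P[m:, m:]

    # A = P_vv^{-1} P_vh, computed with a linear solve instead of an inverse.
    A = np.linalg.solve(P_vv, P_vh)

    # Missing blocks chosen so that the conditional of the hidden part given
    # the observed part matches the one implied by P.
    K_vh = K_vv @ A
    K_hh = P_hh - P_vh.T @ A + A.T @ K_vv @ A

    return np.block([[K_vv, K_vh], [K_vh.T, K_hh]])


# Hypothetical demo data: 6 proteins, structure known for the first 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
P = X @ X.T + np.eye(6)        # stand-in for a sequence kernel matrix
Y = rng.normal(size=(4, 3))
K_vv = Y @ Y.T + np.eye(4)     # stand-in for the observed structure kernel block

K_full = complete_kernel(K_vv, P)
```

The completed matrix keeps the observed block unchanged and stays positive definite whenever `K_vv` and `P` are, so it can be passed to a kernel classifier such as an SVM with a precomputed kernel.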
Similar resources
Mutual Kernel Matrix Completion
With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task among data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques had been introduced, especially in the advent of kernel methods. However, among the many data completion techniques available in the lite...
Support vector machine training using matrix completion techniques
We combine interior-point methods and results from matrix completion theory in an approximate method for the large dense quadratic programming problems that arise in support vector machine training. The basic idea is to replace the dense kernel matrix with the maximum determinant positive definite completion of a subset of the entries of the kernel matrix. The resulting approximate kernel matri...
Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima
Kernel PCA is a widely used nonlinear dimension reduction technique in machine learning, but storing the kernel matrix is notoriously challenging when the sample size is large. Inspired by [YPCC16], where the idea of partial matrix sampling followed by nonconvex optimization is proposed for matrix completion and robust PCA, we apply a similar approach to memory-efficient Kernel PCA. In theory, w...
Multi-view Weak-label Learning based on Matrix Completion
Weak-label learning is an important branch of multi-label learning; it deals with samples annotated with incomplete (weak) labels. Previous work on weak-label learning mainly considers data represented by a single view. An intuitive way to leverage multiple features obtained from different views is to concatenate the features into a single vector. However, this process is not only prone to over...
Low-rank Matrix Recovery via Iteratively Reweighted Least Squares Minimization
We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximatively low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the null...